📘 Introduction

Analytically reproducible documents

Author: Hélène Langet

Affiliation: Swiss TPH Research-IT

Published: January 24, 2025

1 Research = a dynamic process

  • Research insights are typically disseminated through reports (e.g., scientific presentations or publications). A report combines a textual narrative detailing the research context, methods, and key findings, often supplemented with figures and tables that summarize results, with a final discussion in which the findings serve as evidence to support conclusions and recommendations;
  • Research is an iterative and dynamic process, meaning that no result or report is ever final or definitive;
  • In addition, we continuously build upon the work of others to generate new insights and discoveries.

Image from Jorge Cham (PhDComics)

2 Reproducibility in research

3 Analytically reproducible documents

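These documents typically contain three main types of content: text written in a markup language, code written in a programming language, and the outputs produced by that code. Integrating code and natural language in this way is called “literate programming”.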

Markup languages can be written using any plain text editor. They use markup elements to define how text should be displayed or printed.

3.0.0.1 HTML

HTML is used to structure content on the web.

<b>This text will be displayed in bold</b>

3.0.0.2 LaTeX

LaTeX is used for academic and technical documents.

\textbf{This text will be displayed in bold}

3.0.0.3 Markdown

Markdown is a lightweight markup language.

**This text will be displayed in bold**
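To make the idea of markup concrete, the sketch below rewrites the Markdown bold syntax shown above into its HTML and LaTeX equivalents. It is a minimal Python illustration, not how real converters work: the function name `convert_bold`, the `TEMPLATES` table, and the regular expression are assumptions made for this example, and only the bold pattern is handled.

```python
import re

# Matches Markdown bold spans such as **some text** (non-greedy, single line).
BOLD_PATTERN = re.compile(r"\*\*(.+?)\*\*")

# Hypothetical templates for this illustration: how each target language marks bold text.
TEMPLATES = {
    "html": r"<b>\1</b>",
    "latex": r"\\textbf{\1}",
}


def convert_bold(markdown_text: str, target: str) -> str:
    """Rewrite Markdown bold markup in the syntax of the target markup language."""
    return BOLD_PATTERN.sub(TEMPLATES[target], markdown_text)


source = "**This text will be displayed in bold**"
print(convert_bold(source, "html"))
print(convert_bold(source, "latex"))
```

The content stays the same in all three cases; only the markup elements that describe its display differ.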

Different programming languages allow us to execute code to generate results or perform tasks.

3.0.0.4 R

```{r}
library(ggplot2)

# Bar chart of prevalence (%) by country
data.frame(
  country = c("Nigeria", "Kenya", "India"),
  prevalence = c(14.5, 9.2, 3.5)
) |>
  ggplot(aes(x = country, y = prevalence)) +
  geom_col(fill = "steelblue")  # geom_col() is equivalent to geom_bar(stat = "identity")
```

3.0.0.5 Python

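```{r}
# Set the Python path before loading reticulate (machine-specific; adjust as needed)
Sys.setenv(RETICULATE_PYTHON = "C:/ProgramData/anaconda3/python.exe")
library(reticulate)
```

```{python}
import matplotlib.pyplot as plt

# Bar chart of prevalence (%) by country
plt.bar(['Nigeria', 'Kenya', 'India'], [14.5, 9.2, 3.5], color='steelblue')
plt.ylabel('Prevalence (%)')
plt.show()
```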

3.0.0.6 Observable JS

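```{ojs}
// Uses Observable Plot (`Plot`), available in the OJS standard library
Plot.barY([["Nigeria", 14.5], ["Kenya", 9.2], ["India", 3.5]], {x: d => d[0], y: d => d[1], fill: "steelblue"})
  .plot({y: {label: "Prevalence (%)"}})
```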

Executing code typically produces visualizations, printed results such as model summaries, or tables. Examples of printed output are shown below:


Call:  glm(formula = confirmed ~ age, family = binomial, data = df)

Coefficients:
(Intercept)          age  
   1.312275     0.001292  

Degrees of Freedom: 65668 Total (i.e. Null);  65667 Residual
Null Deviance:      66000 
Residual Deviance: 66000    AIC: 66000

c1      c2
setosa  5.1
setosa  4.9
setosa  4.7
setosa  4.6
setosa  5.0

Documents can be rendered to different output formats, e.g., MS Word, PDF, HTML, or PowerPoint.

Pandoc is one of the tools that perform this conversion.
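As an illustration of this conversion step, the sketch below calls the Pandoc command-line tool from Python. It assumes Pandoc is installed and available on the PATH; the helper name `pandoc_command` and the file name `report.md` are hypothetical. Pandoc infers the output format from the extension of the file passed to `-o`.

```python
import shutil
import subprocess
from pathlib import Path


def pandoc_command(source: str, output: str) -> list[str]:
    """Build a Pandoc call; Pandoc infers the output format from the file extension."""
    return ["pandoc", source, "-o", output]


# One hypothetical Markdown source converted to three output formats.
commands = [pandoc_command("report.md", f"report.{ext}") for ext in ("docx", "html", "pptx")]

for command in commands:
    print(" ".join(command))
    # Run the conversion only when Pandoc and the source file are actually available.
    if shutil.which("pandoc") and Path(command[1]).exists():
        subprocess.run(command, check=True)
```

PDF output is left out of the loop because it additionally requires a PDF engine such as a LaTeX installation.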

4 Existing tools for writing analytically reproducible documents

5 Quarto

  • Quarto is the successor to R Markdown, but is not tied to the R language.
  • Quarto files have a .qmd extension.
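A `.qmd` file is plain text made of a YAML header, Markdown narrative, and executable code chunks. The sketch below writes a minimal one from Python. The file name `example.qmd`, the title, and the chunk contents are hypothetical, and the chunk fence is assembled programmatically only to avoid nesting code fences inside this example.

```python
import tempfile
from pathlib import Path

FENCE = "`" * 3  # three backticks, built here to avoid nesting fences in this example

# Hypothetical minimal Quarto document: YAML header, narrative text, one code chunk.
qmd_lines = [
    "---",
    'title: "Prevalence by country"',
    "format: html",
    "---",
    "",
    "Narrative text in **Markdown** explains the analysis.",
    "",
    FENCE + "{python}",
    "prevalence = {'Nigeria': 14.5, 'Kenya': 9.2, 'India': 3.5}",
    "print(max(prevalence, key=prevalence.get))",
    FENCE,
    "",
]

# Written to a temporary directory so that the example leaves the project untouched.
qmd_path = Path(tempfile.mkdtemp()) / "example.qmd"
qmd_path.write_text("\n".join(qmd_lines), encoding="utf-8")
print(qmd_path.read_text(encoding="utf-8"))
```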

5.1 Source document

5.2 Rendered output

5.3 Quarto rendered outputs

  • Quarto documents can be rendered into many report formats, including HTML, MS Word, and many more
  • List of supported formats
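Rendering is done with the `quarto render` command, where `--to` selects the output format. The sketch below only assembles the commands for a few formats; the file name `report.qmd` and the helper `render_command` are hypothetical, and running the printed commands assumes the Quarto CLI is installed.

```python
def render_command(source: str, output_format: str) -> list[str]:
    """Build a `quarto render` call for one output format."""
    return ["quarto", "render", source, "--to", output_format]


# Format names as used by Quarto: html, docx (MS Word), pdf, pptx (PowerPoint).
for output_format in ("html", "docx", "pdf", "pptx"):
    print(" ".join(render_command("report.qmd", output_format)))
```

The same choice can also be made inside the document by setting `format:` in the YAML header.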

5.4 Engines

Both knitr and Jupyter serve as engines to execute code embedded within a document, but they work in different programming environments.

knitr is an R package that reads the code chunks, executes them, and ‘knits’ the results back into the document. This is how tables and graphs are included alongside the text.
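The read, execute, and knit cycle can be illustrated with a toy example. The Python sketch below is not knitr and handles only Python chunks; the function name `weave` and the simplified chunk detection are assumptions made for this illustration. It scans a document for fenced chunks, runs each one, and places the captured output right after the code.

```python
import contextlib
import io

FENCE = "`" * 3  # three backticks, built here to avoid nesting fences in this example


def weave(lines: list[str]) -> list[str]:
    """Run each {python} chunk and insert its printed output after the chunk."""
    woven, chunk, in_chunk = [], [], False
    namespace: dict = {}  # shared across chunks, like one session per document
    for line in lines:
        woven.append(line)
        if line == FENCE + "{python}":
            in_chunk, chunk = True, []
        elif line == FENCE and in_chunk:
            in_chunk = False
            buffer = io.StringIO()
            with contextlib.redirect_stdout(buffer):
                exec("\n".join(chunk), namespace)
            # Prefix output lines so they are distinguishable from the narrative.
            woven.extend("#> " + out for out in buffer.getvalue().splitlines())
        elif in_chunk:
            chunk.append(line)
    return woven


document = [
    "The highest prevalence is computed below.",
    FENCE + "{python}",
    "prevalence = {'Nigeria': 14.5, 'Kenya': 9.2, 'India': 3.5}",
    "print(max(prevalence.values()))",
    FENCE,
]
print("\n".join(weave(document)))
```

Because the output is regenerated every time the document is woven, the reported number cannot drift away from the code that produced it.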

Jupyter is a popular engine for running Python code interactively. It supports multiple programming languages, but Python is the most common.
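When no engine is specified, Quarto chooses one from the chunk languages found in the document. The sketch below is a simplified approximation of that behaviour, not Quarto's actual implementation: it assumes that any `{r}` chunk selects knitr, that any other executable chunk selects Jupyter, and that chunk types handled by Quarto itself (such as `{ojs}`) do not need an engine. The function name `guess_engine` and the `HANDLED_BY_QUARTO` set are hypothetical.

```python
import re

FENCE = "`" * 3  # three backticks, built here to avoid nesting fences in this example

# Matches the opening line of an executable chunk: three backticks, then {language}.
CHUNK_OPEN = re.compile(r"^`{3}\{(\w+)")

# Assumption: these chunk types are handled by Quarto itself and need no engine.
HANDLED_BY_QUARTO = {"ojs", "mermaid", "dot"}


def guess_engine(lines: list[str]) -> str:
    """Approximate which engine would run the chunks of a document."""
    languages = {m.group(1) for line in lines if (m := CHUNK_OPEN.match(line))}
    languages -= HANDLED_BY_QUARTO
    if "r" in languages:
        return "knitr"  # R chunks present; Python chunks then run via reticulate
    if languages:
        return "jupyter"  # other executable chunks, e.g. Python or Julia
    return "markdown"  # no executable chunks: nothing to execute


print(guess_engine([FENCE + "{r}", "summary(cars)", FENCE]))
print(guess_engine([FENCE + "{python}", "print(1)", FENCE]))
print(guess_engine(["Only narrative text."]))
```

The engine can also be set explicitly in the YAML header with `engine: knitr` or `engine: jupyter`, which overrides any automatic choice.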

6 References

Reuse

CC-BY